Sound stream segregation: a neuromorphic approach to solve the “cocktail party problem” in real-time
Abstract
The human auditory system has the ability to segregate complex auditory scenes into a foreground component and a background, allowing us to listen to specific speech sounds from a mixture of sounds. Selective attention plays a crucial role in this process, colloquially known as the "cocktail party effect." To date, it has not been possible to build a machine that can emulate this human ability in real-time. Here, we have developed a framework for the implementation of a neuromorphic sound segregation algorithm in a Field Programmable Gate Array (FPGA). This algorithm is based on the principles of temporal coherence and uses an attention signal to separate a target sound stream from background noise. Temporal coherence implies that auditory features belonging to the same sound source are coherently modulated and evoke highly correlated neural response patterns. The basis for this form of sound segregation is that responses from pairs of channels that are strongly positively correlated belong to the same stream, while channels that are uncorrelated or anti-correlated belong to different streams. In our framework, we have used a neuromorphic cochlea as a frontend sound analyser to extract spatial information of the sound input, which then passes through band-pass filters that extract the sound envelope at various modulation rates. Further stages include feature extraction and mask generation; the mask is then used to reconstruct the targeted sound. Using sample tonal and speech mixtures, we show that our FPGA architecture is able to segregate sound sources in real-time. The accuracy of segregation is indicated by the high signal-to-noise ratio (SNR) of the segregated stream (90, 77, and 55 dB for simple tone, complex tone, and speech, respectively) as compared to the SNR of the mixture waveform (0 dB).
This system may be easily extended for the segregation of complex speech signals, and may thus find various applications in electronic devices, such as sound segregation and speech recognition.
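The temporal-coherence rule described in the abstract (channels whose envelopes are strongly positively correlated with the attended channel belong to the target stream) can be illustrated with a small software sketch. This is not the authors' FPGA implementation: the function names `coherence_mask` and `snr_db`, the correlation threshold of 0.5, the toy envelopes, and the SNR definition (signal power over residual power) are all assumptions made for illustration.

```python
import numpy as np


def coherence_mask(envelopes, attended_channel, threshold=0.5):
    """Select channels whose envelopes correlate with the attended channel.

    envelopes: array of shape (channels, time), one envelope per channel.
    The threshold value is an illustrative assumption, not from the paper.
    """
    corr = np.corrcoef(envelopes)  # channels x channels Pearson correlation
    return corr[attended_channel] > threshold


def snr_db(target, estimate):
    """SNR in dB, assuming the common definition: target power / residual power."""
    residual = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))


# Toy scene (hypothetical data): two sources with different modulation rates.
t = np.arange(0, 2.0, 0.001)                  # 2 s sampled at 1 kHz
env_a = 1.0 + np.sin(2 * np.pi * 4 * t)       # source A, modulated at 4 Hz
env_b = 1.0 + np.sin(2 * np.pi * 7 * t)       # source B, modulated at 7 Hz
gains = [1.0, 0.8, 0.6]

# Channels 0-2 follow source A, channels 3-5 follow source B.
envelopes = np.vstack([g * env_a for g in gains] + [g * env_b for g in gains])

# Attention is directed at channel 0, so the mask should keep channels 0-2.
mask = coherence_mask(envelopes, attended_channel=0)

# An equal-power mixture of two tones has an SNR of about 0 dB for either tone.
target = np.sin(2 * np.pi * 50 * t)
interferer = np.sin(2 * np.pi * 80 * t)
mixture_snr = snr_db(target, target + interferer)

print(mask)
print(round(mixture_snr, 3))
```

In this toy case the mask keeps only the channels driven by the attended source, and the unprocessed mixture sits at roughly 0 dB, which matches the baseline the abstract quotes for the mixture waveform. A real system would compute the correlations over short sliding windows so that the mask can track changes in the scene.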
Similar resources
Cocktail Party Processing
Speech segregation, or the cocktail party problem, has proven to be extremely challenging. This presentation describes a computational auditory scene analysis (CASA) approach to the cocktail party problem. This approach performs auditory segmentation and grouping in a two-dimensional time-frequency representation that encodes proximity in frequency and time, periodicity, amplitude modulation, a...
Recovering sound sources from embedded repetition.
Cocktail parties and other natural auditory environments present organisms with mixtures of sounds. Segregating individual sound sources is thought to require prior knowledge of source properties, yet these presumably cannot be learned unless the sources are segregated first. Here we show that the auditory system can bootstrap its way around this problem by identifying sound sources as repeatin...
27 The Correlative Brain: A Stream Segregation Model
The question of how everyday cluttered acoustic environments are parsed by the auditory system into separate streams is one of the most fundamental in perceptual science. Despite its importance, the study of its underlying neural mechanisms remains in its infancy, with a lack of general frameworks to account for both psychoacoustic and physiological experimental findings. Consequently, the few ...
The cocktail party problem
Natural auditory environments, be they cocktail parties or rain forests, contain many things that concurrently make sounds. The cocktail party problem is the task of hearing a sound of interest, often a speech signal, in this sort of complex auditory setting (Figure 1). The problem is intrinsically quite difficult, and there has been longstanding interest in how humans manage to solve it. The p...
Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously
This paper reports the preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition (ASR). Speech stream segregation (SSS) is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, an...